reviewer 3
Paper: Generalization of Reinforcement Learners with Working and Episodic Memory
We thank the reviewers for their thoughtful and constructive feedback on our manuscript. This should help both contextualize each task's difficulty and illustrate what it involves. Reviewer 3 noted the Section 2 task descriptions could be better presented. We have reformatted it so that "the order We also changed our description of IMPALA to match Reviewer 5's suggestion. Regarding the task suite, Reviewer 4 raised a thoughtful consideration on whether "most of the findings translate when Some 3D tasks in the suite already have '2D-like' semi-counterparts that do not require navigation, '2D-like' because everything is fully observable and the agent has a first-person point of view from a fixed point, without Spot the Difference level, was overall harder than Change Detection for our ablation models.
would like to emphasize that we compared with over 1056 benchmarks arising from the domain of neural network verification. Reviewer 3
We are deeply appreciative of the reviewers for their feedback amidst these trying circumstances. To summarize, DeWeight is indeed the state-of-the-art technique for benchmarks with large tilt. We are not aware of any practical applications of discrete integration that have small tilt. As mentioned on line 219, we tested our tool on 1056 formulas arising from the domain of neural network verification. These formulas evaluate the robustness, trojan attack effectiveness, and fairness of a binarized neural network.
Reviewer 1: We will be sure to provide a more accurate and nuanced discussion of the downsides of our auxiliary
We thank all reviewers for their constructive and helpful comments. Reviewer 1: Regarding runtime evaluation, what we called the "wall clock time" is the sum of the GPU time and the CPU time, and the reported time to "run the neural net on its own" is the GPU time. We will revise our paper to include this discussion. We have filled in this gap in the literature for flow models. ANS for autoregressive models, which are slow for decoding.
We appreciate the careful reviews! W TS (T, y), this could be added to Theorem 2. We don't have a suboptimality analysis for I We believe that our algorithms have other advantages over IDS. Moreover, IDS's computational complexity is IRS policy is recursive like TS: i.e., the decision at a certain moment depends only on the posterior distribution and the We absolutely agree with the fact that the stochastic MAB with independent arms has already been studied extensively.